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Abstract 

An optimization based state and parameter estimation method is presented where the required Jacobian matrix of the 
cost function is computed via automatic differentiation. Automatic differentiation evaluates the programming code of 
the cost function and provides exact values of the derivatives. In contrast to numerical differentiation it is not suffering 
from approximation errors and compared to symbolic differentiation it is more convenient to use, because no closed 
analytic expressions are required. Furthermore, we demonstrate how to generalize the parameter estimation scheme 
to delay differential equations, where estimating the delay time requires attention. 
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1. Introduction 

For many processes in physics or other fields of science mathematical models exist (in terms of differential equa¬ 
tions, for example), but not all state variables are easily accessible (measurable) and proper values of model parame¬ 
ters may be (partly) unknown. In particular, detailled biological cell models (e.g., cardiac myocytes |[T]) may include 
many variables which are difficult to access experimentally and, in addition, depend on up to hundreds of physical 
parameters whose values have to be determined. To estimate unobserved variables (as a function of time) and model 
parameters different identification methods have been devised EUIaI [T6l420l . These methods have in common that 
an attempt is made to adjust the model output (in general a function of the state variables) to some (experimentally) 
observed time series. To achieve agreement, unobserved variables and unknown model parameters are suitably ad¬ 
justed such that the model reproduces and follows the observed time series. In geosciences and meteorology (e.g., 
whether forecasting) this procedure is often called data assimilation and describes the process of incorporating new 
(incoming) data into a computer model of the real system. 

A general framework for state estimation provides, for example, the path integral formalism including a saddle 
point approximation nsKH. This formalism can be used to state the estimation problem as an optimization problem 
inHiiiiiiiii. If an optimization method is employed that is based on gradient descent (such as the well-known 
Levenberg-Marquard method 121] |22), in general the Jacobian matrix of the cost function has to be provided, whose 
derivation may be quite cumbersome (and error-prone), depending on the structure of the cost function and the un¬ 
derlying mathematical model of the dynamical system. To estimate the Jacobian matrix one may approximate it by 
numerical derivatives (often spoiled by unacceptably large truncation errors) or use symbolic mathematics, which 
requires, however, that the function to be derived has to be given in closed form. 

A convenient alternative to both of these methods is automatic differentiation Il24ll where exact numerical values 
of the required derivatives are computed by analyzing a given source code implementation of the cost function. As 
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will be shown here automatic differentiation leads in this context not only to a very flexible and efficient algorithm for 
computing the required Jacobian but also provides the sparsity pattern of the Jacobian which is exploited by suitable 
optimization methods. In Section|^we will give a formal description of the optimization problem to be solved for state 
and parameter estimation. Then we briefly present in Section]^ the concept of automatic differentiation in the form 
used here. As an illustrative example we show in Section]^ how to estimate the model parameters of the Lorenz-96 
model. In Sectionj^we discuss how to estimate the delay time in delay differential equations and provide in Section 
l^an example (Mackey-Glass model). 

2. State and parameter estimation method 

The method used here to adapt a model to a time series is based on minimizing a cost function and was introduced 
in Ref. ||T9l . For completeness we present in the following an extended version covering also delay differential 
equations (DDEs). 

We assume that a multivariate R-dimensional time series {//(«)) is given consisting of A -i- 1 samples J/(n)=i/(fn) e 
K* measured at times T = {f„ = n ■ Af | n = 0,1, • ■ •, A). For simplicity the observation times f„ are equally spaced 
(with a fixed time step Af) and start at to - 0. The estimation method can easily be extended to nonuniformly sampled 
observations (see Ref. mi). Here we consider the general case of a model given by a set of coupled delay differential 
equations (DDE) 


^ F(y(t),y^(t),p,t), (1) 

with y^(t) = y(t - r). The state vector(s) y(t) = (yi(0> • ■ ■ the delay parameter t 6 K and the U model 

parameters p = (pi,... ,pu)^ are unknown and have to be estimated from the time series {»/(«)). Estimating t can 
not be conducted as estimating p, because F(y(t),y^(t), p, t) does not explicitly depend on r. In fact F(y{t),y^{t), p, t) 
depends on y^{t) which is a function of t. We shall later come back to this topic. 

Note that ([T]) also describes (as a special case) models given by coupled ordinary differential equations (ODEs). 
In this case the right-hand side of Q is independent of y^it) and thus can be replaced by F(y{t),p, t) (see Ref. mi 
for details). 

To estimate the unknown quantities a measurement function 

z{t) ^ h(y(t),q,t) (2) 

is required to represent the relation between model states y{t) and the z(f) corresponding to the observations {i/(n)). 
This measurement function may contain V additional unknown parameters q - {q\,... ,qv)^ that also have to be 
estimated using information from the given time series {//(«)). 

2.1. Cost function 

The goal of the estimation process is to find a set of values for all unknown quantities such that the model equations 
provide via measurement function (|^ a model times series {z(f„)) that matches the experimental time series {»/(?„)). In 
other words, the average difference between //(f„) and z(f„) should be small. Furthermore, the model equations should 
be fulfilled as well as possible. This means that modeling errors u{t) are allowed, but should be small. Therefore, 
model Q is extended to 

= F(y(t),yft),p, t) + u(t). (3) 

The smaller u(t) is the better the model equations Q are fulfilled. Next, for simplicity, u{t) and y(t) will be 
discretized at the times in T. This means that y{t) will be sampled at the same time when data are observed. With 
y(n)-y{n ■ Af)=j(f„) and }/(a,b) — {y(n) \ n - a,a + I,... ,b} the set of values of the discretized model variables 
can be written as d/(0,A). The quantities in J/(0,A) have to be estimated in addition to p and q. With the same 
discretization we have {«(«)) = {«(?«)). At this point we assume a fixed (not to be estimated) delay t - k ■ At with 
k e N which is not necessarily equal to the delay parameter of the physical process underlying the data. This simplifies 
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the discretization of the delayed variable to y^{t) - y(n ■ At -k - At) — y((n - k) ■ At) - y(n -k) — yj^in). The set of the 
discretized delayed variable is then = ^(—k,N-k). Notethat A^-A:) = }fi-k, — \)VJ}f{0,N-k). Since 

^{Q,N-k) c J/(0, A^), A/{Q,N-k) contains no additional quantities to be determined. Only the variables in ^{—k, -1) 
are additional quantities which have to be estimated. Typically the delay time is much shorter than the length of the 
time series N ■ At and hence the number of elements in ^{-k,-\) is much smaller than in Af{0,N). Therefore the 
number of quantities to be estimated does not increase much compared to a model given by ODEs (with similar D 
and N) where Afi-k, -1) has not to be estimated. 

The discretization of (|^ is then given by 


u{n) ss 


At r, 


-F(j{n),y^(n),p,t„), 


(4) 


whereas the symbol stands for the finite difference approximation of at time f„. The goal of the adaption 

process is to minimize (on average) the norm of u{n) and the norm of the difference t]{t„) - zit,,)- 
This leads to a cost function 


C0(-k,N),p,q) ^Ci +C2 + C3+C4 


(5) 


with 


N 

Cl = ^ ^ (i/(n) - z(n))'^ A {t]{n) - z{n)) 

I — a j 

Cl = —^ ■ 2^ u{n)^Bu(n) 

^ n=0 

1 T 

C 3 = ■ Yj (->'apr(«) - 3'(«))^ E (3'^p,(n) - y(n)) 

rt=3 

B y 

C4 = - ■ ^f(H’,W’i,H’u) ■q{w,w\,Wa). 


(6) 

(7) 

(8) 
(9) 


Cl penalizes the difference between q{n) and z(n) whereas Ci penalizes large magnitudes of u{n). A, B, and E are 
weight matrices that will be specified later. At the minimum of Q the solution 0{-k,N),p, q) is obtained which is 
considered as the solution of the estimation problem. In the term C 3 a Hermite interpolation is performed to determine 
3'apr(”) from neighboring points and the time derivatives which are, according to Q, given by 


G{y{t),y2t),p,t) = F(yit),y2t),p,t) + u{t). (10) 

With Eq. ( [TOl ) the Hermite interpolation reads 

Japr(«) b(« - 2) +3'(n + 2)] + ^ [yin - 1) + yin + 1)] 

+ ^ [Giyin - 2),y2n - 2),p, tn-i) - Giyin + 2), j^(n + 2),p, tn+i)] 
io 

4Af 

+ [Giyin - l),yk(n- l),p,t„-i)-Giyin+ l),j*(n + l),/;,f„+i)] . (11) 

Smoothness of J/(0, A) is enforced by small differences ^aprC”) “ y(n)- The term C 3 suppresses non-smooth 
(oscillating) solutions which may occur without this term in the cost function. Let 


w - ii}/iQ,N),i}/i-k,-l),p,q) 

= iifi-k,N),p,q) 

= (wi,...,Wz.) (12) 
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be a vector containing all quantities to be estimatedAgain, if the model is given by ODEs, does not 

occur in w. Hence for ODEs we obtain 


w^mO,N),p,q). (13) 

To force w to stay between the lower and upper bounds Wi and Wu, respectively, q(w, Wu Wu) - {qi,..., qi)^ is defined 
as 


Wu,i - Wi 

for 

Wi 

> Wu,i 

0 

for 

Wl„ 

i < Wi < Wu, 

Wl,i - Wi 

for 

Wi 

VI 


(14) 


qi is zero if the value of w, lies within its bounds. To enforce this, the positive parameter p is set to a large number, e.g. 
10^. In this paper the matrices A, B and E are diagonal matrices. The diagonal elements can be used for an individual 
weighting. 

The homotopy parameter a can be used to adjust whether the solution should be close to data (a « 1) or have 
a smaller error in fulfilling the model equations (see Ref. mi). In II 20 II a possible technique is described to find an 
optimal a. Eurthermore one might use continuation (see Ref. mi) where a is stepwise decreased. Starting with a ^ \ 
results in a solution close to the data. Then, a is slightly decreased and the previously obtained solution is used as an 
initial guess and the cost function is be optimized again. This procedure is repeated until the value a - 0.5 is reached. 

Note that the cost function can be written in the form 


j 

C(tv) = ^//»2 = ||H(H-)||2 (15) 

>=i 

where H{w) is a high dimensional vector valued function of the high dimensional vector w. To optimize ( [TSl l we 
use an implementation of the Levenberg-Marquardt algorithm Il2ni22]| called sparseLM f23\ . Although C(w’) will be 
optimized, sparseLM requires H{w) and the sparse Jacobian of H{w) as input. In the next section we discuss how to 
derive the Jacobian and its sparsity structure. 


3. Automatic differentiation 

The technique used here to estimate the variables and the parameters of a model from time series is based on 
minimizing the cost function Q which can be written in the form of Eq. ( [T5] l. The Levenberg-Marquardt algorithm 
used to minimize this cost function needs the vector valued function H(w) and its Jacobian dHjdw with the elements 
dHjIdwi as input. Remember that the Jacobian has a sparse structure, i.e. it has many elements which are always zero. 
To compute this sparse Jacobian there exist three different techniques; numerical differentiation, symbolic differenti¬ 
ation and automatic differentiation (symbolic differentiation includes differentiation by hand). These techniques have 
particular advantages and disadvantages. Numerical differentiation is easy to implement, but numerically not exact. 
Eurthermore, the sparsity pattern can not be detected reliably. Symbolic differentiation is numerically exact, but the 
function to be differentiated has to be available as a single expression. Deriving the Jacobian by hand usually is very 
error prone. Symbolic differentiation tools may help at this point, however, a change in the cost function requires de¬ 
riving a new Jacobian. As an alternative the concept of automatic differentiation can be used. It is easy to implement, 
numerically exact and the sparsity pattern of the Jacobian can be detected automatically. Only the source code of 
the cost function is required by the automatic differentiation tool. Using automatic differentiation requires additional 
computational resources. 

After weighing up the pros and cons of the discussed methods for computing the Jacobian we came to the con¬ 
clusion that the concept of automatic differentiation is the most suitable one. Automatic differentiation is used here 
in terms of the tool (library) ADOL-C ll25l l26l . ADOL-C provides functions to derive the numerical values of the 


'Here we assume that a fixed but arbitrary rule is used to order the elements of the sets N(0,N} and to define the elements of the 

vector w. 
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Example: 

Section 

Mackey-Glass model 

FI 

80-dim Lorenz96 system 

FI 

CPU time for: 



cost function calls 

L07s 

3L8s 

Jacobian calls (ADOL-C) 

2.34s 

155s 

optimizer (sparseLM) 

3.14s 

21845s 


Table 1: CPU times needed for evaluation of the values of the sparse Jacobian compared to the CPU time needed by the optimization routine. For 
the Mackey-Glass model only the CPU time for r e [2.3,2.4] from the second step of estimating the delay time is shown. The time values were 
measured on a 1.3GHz Dual core computer with 3GB memory. 


Jacobian dHjdw of the function H(w). Furthermore the sparsity pattern of dHjdw can be detected and the numerical 
values of the non-vanishing elements can be derived (this functionality requires the graph coloring package ColPack 

EH). 

We used the Python interface Pyadolc ED to ADOL-C wrapping functions of the ADOL-C library to Python. 
The advantage of using the Python interface instead of the C interface is that the cost function can be coded directly 
in Python using Numpy BOl arrays. ADOL-C is based on operator overloading. This means, that for computing the 
Jacobian the function to be differentiated has to be available as source code, only, with an input w and return H{w). 
For evaluating the cost function usually the elements of w have a numerical data type (e.g. integer, float, ...). 

Typically deriving the sparsity pattern takes much more time than computing the non-zero values. However, 
this is more or less negligible because the detection of the sparsity pattern only has to be performed once, whereas 
the computation of the values of the non-zero elements occurs several times (always when the Jacobian has to be 
computed for a certain input w, usually in each iteration of a numerical minimization routine). 

For the examples in Sections]^ andthe time needed by ADOL-C used for computing the Jacobian of Eq. ( [T5] l 
was compared with the time needed by the used minimization routine (sparseLM) and the results are shown in Tab. [T] 
It turned out that the time needed by ADOL-C is much shorter than the time needed by sparseLM, i.e. the time needed 
to evaluate the cost function is negligible. Hence using ADOL-C does not lead to a significant increase of CPU time 
needed to solve the estimation problem. 

4. Lorenz-96 model 

As described in Sectionj^the estimation method can be used to adapt a system of ODEs to a time series (without 
time delay). As our first example we use the Lorenz-96 model that was introduced by E. Lorenz in 1996 Il28l . Here 
the D = 80 dimensional system is used given by the set of ODEs 


^ ■ (y/+i(f) - yi-iit)) - yAt) + p (16) 

whereas i = 1,... ,D is a cyclic index. This means that for i - D it is = y^. Eor / = 1 it is / - 1 - D and 
i - 2 - D - 1. To compare the results from the estimation process a twin experiment is performed. The time series 
{t/(fn)) with t„ 6 {0,0.01,0.02,..., 10) is generated by integrating a similar model 


^ = Xi-iit) ■ (x^it) - Xi.2(t)) - x,(f) + 8.17 (17) 

df 

with the same dimension and cyclic index and taking the solution to build a noisy R - Djl - AO dimensional 
multivariative time series by “observing” every second model variable, 

»/ts(L) = (xi(f„),X3(f„),X5(f„), . ..XD-\{tn)) . (18) 

A common type of noise in experimentally observed time series is white noise which is given by normally distributed 
random numbers with a variance cr^ and a mean which is zero. Adding the noise to the clean time series we obtain a 
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noisy multivariate time series 


+ ritn), r(t„) = (ri(tn), r 2 (t„),rR(t„)) with r,(f„) ~ N(0, cr^) (19) 


which is typical for experiments with measurement noise. To quantify the power of the clean signal and the noise one 
can define the signal-to-noise-ratio (SNR in dB) for each time series as 


SNRiMQ}) = 10 ■ logio 


2„=0 ?7ts,i) 




( 20 ) 


(the overbar denotes the mean). The smaller the SNR the more measurement noise is present. 
In this example the observed time series is given by 


l/(f„) = (Xl(tn),X-i{t„),X5{tn), ■ . . XD-l(t„)) + r(0, 1.0) 


( 21 ) 


with a mean of the SNR of SNR({i/(f„))) - IjR- SNR({7;,(f„))) ^ 10.7dB. The measurement function is given by 

h(y(t),t) = (yi{t),y3(t),y5(t),.. ■ ( 22 ) 

Next, the model ( [T6| ) is adapted to {?/(?«)) and the model variables and the parameter are estimated from the time 
series. The weighting matrices of the cost function ( |5|l we re fixed to: A = lfl /2 {D/2 x D/2 unity matrix), B - Id, 
E - 10^ ■ Iz) and (i - 10^. As mentioned in Sectioti |2T| continuation with a is used here. This means that a was 
set to the following values: 0.9999,0.999,0.99,0.9,0.5. First, the cost function was optimized with a - 0.9999, then 
the resulting solution was used as the initial guess for the next optimization of the cost function with a - 0.999. This 
procedure was repeated until a - 0.5 is reached. The solution of the optimization problem with a - 0.5 was then 
considered as the solution of the estimation problem. The results of the estimated solution for the model variables are 
shown in Fig. 


(a) 


(b) 




(C) 


(d) 


observed variables 


unobserved variables 



t 


t 


Figure 1: Adaption of the 80 dimensional Lorenz96 model (Eq. jl6) ) to the time series {i]{tn)] (Eq. i |17^, g reen circles). Every second model 
variable was observed. Left-hand side: The model output given by the measurement function h{t) (Eq. \22\ , blue lines) was adapted to 
((a),(c)). (b) and (d) show the “true” solution x/ (red dashed lines) with the corresponding estimates y/ (blue lines),whereas for these quantities there 
exist measurements 7]yi. Right-hand side: (e), (f), (g) and (h) show estimates for the model variables y, (blue lines) and the “true” solutions Xj (red 
dashed lines). Note that for all cases shown on the right-hand side there exist no measurements. The model parameter is estimated to p = 8.165, 
whereas 8.17 is used for generating the data by GZ)- Note that only a few of the 80 estimated model variables (with and without data) are shown. 


One can see a good coincidence of the model variables y(t) on the one hand and the “trae” solution x{t) on the 
other hand. This is also the case for the observations {»/(?„)) and h(y{t)). Furthermore the modeling error u{t) is small. 
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5. Delay differential equations and the estimation of delay parameters 


An application where the cost function becomes rather complex and automatic differentiation turns out to be 
beneficial is the estimation of parameters and states of DDEs. How to derive the delayed variable y^(n) = y(n - k) 
from y{n) for t - k ■ At with k e N was already discussed in Section 2.1 In this case ^kiO,N) can be computed 
from d/(0, N) without interpolation. However, if A: e K+ this approach does not work anymore. One has to interpolate 
J/(0, A) to approximate A). Because At will be small a linear interpolation should be sufficient and hence is 
used here. 

In the general case we have t - (k' + 1) ■ At with t e K+, k' € N and / e [0,1 [. Note that, due to these restrictions 
on k' and /, k' and / are uniquely defined for a given t. Using this substitution for t the delayed variable can be written 


yk(n) = 3't(U) 

= y(tn - t) 

= yitn -{k' +1)- At) 

- y{n ■ At - {k' + 1) ■ At) 

= y((n -k' +1)- At) 

= y(n -k’ +1) (23) 


and with linear interpolation between yin - k') and yin — k' + 1) we obtain 


y^(n) = yin -k' + 1) 

, , yin-k' + l)-yin-k') 

— yin - K ) H-- ■ I - At 

- yin — k') + [yin - k' + \) - yin - k')] ■ I 


(24) 


withy^(n) e [y(n - k'),yin - k' + 1)] for n - 0,1,... ,N. Note, that for I - Owe have t - k' ■ At and hence the same 
situation as in Section 2.1 were y^(n) = yin - k') can be computed without an interpolation of the model variables. 

One possibility to estimate t is discussed now. First, r must be added in Eq. •EH to the quantities to be estimated. 
Next, bounds for r must be set in Eq. Q- This means that there are bounds to k' so that (if the smallest bound of 
T is zero) we have k' e [1, A]. If t and therefore k' and I would be fixed and not estimated, w (Eq. ( [T^ ) would be 
w — iyfi—k',N),p, q). When T will be estimated, the number of elements in the history yfi-k', -1) can vary during the 
optimization process due to variations in t (performed by the optimizer) and in k'. Because the number of elements of 
w has to stay constant during one optimization process, the history with the largest possible number of elements, given 
by d/(-A, -1), must be added to w, which then becomes w - iyfi-K,N),p,q,T). Estimating t with this approach 
has one major disadvantage. When the optimizer evaluates the cost function with a certain t, first k' and I have to 
be derived from t. Next k' is used as an index to define the intervals [y(n - k'),yin - k' + 1)] with n — 0,1,... A 
where to interpolate the model variables for deriving the delayed variables yj.(n) with n - 0,1,.. .N. The cost function 
depends on the model equations, the model equations depend on the delayed variable and the delayed variable is, as 
shown in Eq. a function of the model variables. Only the latter ones are quantities to be estimated. Hence the 
(sparse) Jacobian dHiiw)ldwj of Hiw) (see Eq. ( [T5] l) depends on the (sparse) Jacobian dyi,in) jdyin) of the delayed 
variables to the model variables. For any specific n there exist K possible intervals [y(n - k),yin - k + 1)] with 
k e [1, A] where the linear interpolation might be performed although there is only the one interval with k - k' 
where it finally will be performed. The derivative of the linear interpolation to the model variable can be written as 
dyk(n)ldyin) - dy(j.^i^^,in)ldyin) with k = 0,1,..., A. Since only for k - k' a linear interpolation is performed, the 
relevant elements of the Jacobian matrix corresponding tok + k are zero. Nevertheless, these elements which are zero 
at this step are not necessarily zero during the entire optimization process. Therefore these elements must be included 
in the sparsity pattern of nonzero elements, although, most of the time, they are zero. The sparsity pattern must not 
change during an optimization process. This fact would lead to a significant increase of (possible) nonzero elements 
in the Jacobian and hence would remove the advantage of dealing with a sparse Jacobian. 

To avoid these problems the estimation of the delay parameter is divided into two steps: 
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First step: T he co st function is minimized for several fixed delays, t - k - At with A: € N. In this case, as described in 
Section: 


2.1 


no interpolation is necessary for computing the delayed variable. The solution of this step is C(t). 
This means that the assigned value is the value of the cost function in its minimum for a given t. C(t) has a 
global minimum close to r^in = ■ At. 


Second step: From the first step we have values for the cost function only around the minimum at Tmin - At, Tmin 
and Tmin + At, as illustrated in Fig. The global minimum at f is either in the interval f e [rmin - At, Tmin] 
or in the interval t € [Tmin, Tmin + Af]. Now, t is estimated as well. As described at the beginning of this 
section, the delayed variable is computed by linear interpolation of the model variable. The cost function is 
minimized two times: once with t e [Tmin - Af,Tmin] and then with t e [Tmin, Tmin + Af]. The solution (model 
variables, parameters, delay time) obtained for the interval with the smallest value of the cost function after the 
optimization procedure is then taken as the final solution. 



T 


Figure 2: Estimation of the delay parameter: An illustration of the cost function C(r) around the minimum at T„,in is shown. The values of C(r) 
shown by the (blue) dots where obtained by the first step of estimating the delay parameter. The global minimum of C(t) at f is either in the 
interval f 6 [r^in " or in the interval f e [Tinin^'Tmin + as illustrated by the (red) dashed and continuous lines. Both lines describe 

possible behaviors of C(t) between the (blue) dots. To determine whether the global minimum is in [Tmin ” "rmin] or ['Tmin.'Tmin + both 
intervals are investigated separately in the second step using linear interpolation of the state variable. 


6. Example: Mackey-Glass model 


As an example of a DDE we consider the Mackey-Glass model 


dy(f) 


^ Pi ■ 


y(t - t) 


. This time delay model is given by 
10 P 2 y(t) (25) 


df 1 +y{t- t) 

and has the two model parameters pi, pi beside the delay parameter t. To generate a data time series a twin experiment 


is performed. This means that the model (similar to Eq. 


dx{t) 


= 2 ' 


x(f-2.38) 


1 ■ x{f) 


df 1 H-x(f-2.38)10 

was integrated first. The solution is then used to generate the (noisy) time series 

rjitn) = x(tn) + N(0,0.l) 

with f„ e (0,0.1,..., 60) and N — 601 with timesteps Af = 0.1 (SNR = 8.0dB according to Eq. (|20ll). 
Next the model was adapted to {ri(t„)} using the measurement function 


(26) 


(27) 


%(0,«,«) = y(f)- (28) 

The parameters pi, p 2 and t were estimated in addition to the model variable. As described in Section|^the estimation 
of T is divided into two steps: 


8 










(Pl,P2) 

T 

C(...,T) 

data 

(2,1) 

2.38 


estim. T e [2.2,2.3] 

(1.73,0.86) 

2.300 

o 

1 

estim. T 6 [2.3,2.4] 

(1.98,0.99) 

2.387 

4.99 ■ 10-3 


Table 2: Model parameters pi, p 2 and delay parameter r of the Mackey-Glass model i |25) estimated by adapting the model ( |25) to the time 
series given by Eq. ig. According to Fig. Islthe cost function C{^(—k,N),p,»,r) has its global minimum either in r £ [2.2,2.3] or in r £ 
[2.3,2.4]. Minimizing the cost function and estimating t in addition to the model parameters and variables for both intervals results in a smaller 
C(Sf(-k,N\p,»,T)foxTe [2.3,2.4]. 


• First T is fixed to several different values t - k ■ At with k - 3,4,..., 100 and At — 0.1. For each fixed t the 

cost function is minimized. The value of the cost function at its minimum is denoted by N), p, •, t). Its 

dependence on t is shown in Fig. One can see a clear minimum at t = 2.3. Due to the step size of At = 0.1 
and the smoothness around this minimum one can expect that the global minimum is either at t 6 [2.2,2.3] or 
at re [2.3,2.4]. 

• Second the cost function is minimized two times (hrst with the bounds t e [2.2,2.3] and second with t e 
[2.3,2.4]). In each case the delayed variable is approximated by linear interpolation according to Eq. 
Remember that due to the bounds the index k' does not change during each optimization process and hence 
does not have to be recomputed from r in each iteration. Only / changes. 

For all performed estimation processes the weighting matrices of the cost function (|^ (in this cases they are scalar) 
and a were hxed to: A - B - \, E - - 10^ and a - 0.5. The results for the estimated parameters and the delay 

parameter are shown in tab. 



Figure 3: Cost function for adaption of the Mackey-Glass (251 model to a time series generated in a twin experiment with Eq. Ilz)- 

For T e [2.3,2.4] the cost function C(..., t) at its minimum has a smaller value than for t 6 [2.2,2.3]. Hence the 
solution for r e [2.3,2.4] was chosen as the hnal result. Furthermore the estimated values for p\, p 2 and r coincide 
much better with the “true” values used to generate (77(f„)] (compared to the estimated values with t 6 [2.2,2.3]). The 
estimated model variable for t 6 [2.3,2.4] is shown in Fig. One can see a good coincidence of the model variable 
y(t) on the one hand and the “true” solution x(t) (Fig. |^) and the data (? 7 (f„)] (Fig. |^) on the other hand with a small 
modeling error u(t) (Fig. |^). 

7. Conclusion 

Many optimization based system identihcation methods require information about derivatives of the underlying 
cost function in order to converge to the desired optimum. In general this information has to be provided by the user 


9 






(a) 




(b) 


1.6 

1.4 

1.2 

1.0 

0.8 

0.6 

0.4 

0.2 

0.0 

1.2 

1.0 

0.8 

0.6 

0.4 


(C) 


0.5 

0.0 


0 


10 20 30 40 50 60 

t 



Figure 4: Adaption of the Mackey-Glass ^25^ model to the time series t] (see <27) green circles) which was generated by the original “true” solution 
X red dashed line,) unknown to the estimation algorithm) using the measurement function h(y{t), •, •) (blue line, see The estimated solution 

for the model variable is y (blue line in (b)) and u (black line in (c)) is the error in the approximation of the model equation (see The delay 
time r £ [2.3,2.4] was also estimated beside the model parameters and all estimated values are shown in tah.|^ 


(or programmer) in terms of a Jacobian matrix, for example. Our examples show that this (typically cumbersome) task 
can be conveniently handled by means of automatic differentiation. This versatile tool from applied mathematics and 
computer science not only gives exact numerical values of the required derivatives, but also provides (and respects) 
the sparsity structure of the Jacobian matrix which may be exploited by any calling algorithm. These features have 
been demonstrated with a particular optimization algorithm (SparseLM) that enabled parameter and state estimation 
of the high dimensional Lorenz-96 system and the Mackey-Glass delay differential equation. A challenge for future 
research will be, for example, a successful application of the proposed parameter estimation to electrophysiological 
models of cardiac myocytes. 

Our implementation (Python, C) of the estimation algorithm and a Python wrapper for sparseLM are available for 
download ||32l. Other possible fields of application in nonlinear dynamics are bifurcation analysis and the computation 
of Lyapunov exponents and (covariant) Lyapunov vectors |[33l. 
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Appendix A. Python example for automatic differentiation using ADOL-C 

For illustration we present and discuss a Python example showing how to derive the sparse Jacobian of a function 
H{w) in Listing[T] 
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Listing 1: Simple Python example demonstrating the derivation of the (sparse) Jacobian of the function H(w) usind the automatic differentiation 
tool ADOL-C. 


1 import numpy as np 

2 import adolc 

3 

4 def H(w): 

5 h = np . zeros(3,dtype=w.dtype) 

6 h[0] = 3*w[0]**2 + w[l]*w[3] 

7 h[l] = 4*w[2]**3 

8 h[2] = 5*w[0] + 2*np.exp(np.sin(w[1]*w[2])) 

9 return h 

10 

11 wO = np.array([4.,2.,3.,l.]) 

12 

13 # Trace H(w) 

14 # - 

15 adolc . trace.on (0) 

16 aw = adolc . adouble (wO) 

17 adolc . independent (aw) 

18 

19 Hfunc = H(aw) 

20 

21 adolc . dependent (Hfunc ) 

22 adolc. trace.off () 

23 

24# Evaluate function jacohian at this point 

25 w = np.array([-2.,3.,l.,-4.]) 

26 

21 # Evaluate function 

28 valH = adolc . function (0 ,w) 

29 

30 # Evaluate dense jacohian 

31 jacH = adolc . Jacobian (0 ,w) 

32 # Out JacH: 

33# [[-12. -4. 0. 3.[ 

34# [ 0. 0. 12. 0.[ 

35 # [ 5. -2.280 -6.840 O.J] 

36 

37 # Derive sparsity pattern and evaluate values of nonzeros at wO 

38 #- 

39 options = [0, 0, 0, 0] 

40 [ nnz , rind, cind, val] = adolc.colpack.sparse _jac_no_repeat(0 , wO .options) 

41 # Out nnz (number non-zeros ): 7 
42# Out rind (row indices): 

43# [0 0 0 1 2 2 2] 

44# Out cind (column indices ): 

45 # [0 1 3 2 0 1 2] 

46 # Out val ( values }: 

A1 # [ 24. 1. 2. 108. 5. 4.356 2.904] 

48 

49# Evaluate values of nonzeros at w using nnz, 

50# rind, cind and valH from the previous step 

51 #-^- 

52 [ nnz , rind, cind, val] = adolc.colpack. sparse _jac_repeat(0,w, nnz , rind,cind,val) 

53 # Out nnz (number non-zeros): 7 
54# Out rind (row indices): 

55# [0 0 0 1 2 2 2] 

56# Out cind (column indices ): 

57 # [0 1 3 2 0 1 2] 

58 # Out val (values): 

59 It ]-12. -4. 3. 12. 5. -2.280 -6.840] 
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Before ADOLC can compute the numerical values of the Jacobian data type an internal function representation of 
H(w), called trace, is created in lines 15-22. To do this the data type of the elements of the input vector is changed to 
the ADOL-C data type ’adouble’ (line 16). Many (mathematical) functions are overloaded for this data type and can 
hence be used in the function to be differentiated. FOR-loops, WHILE-loops, IF ... THEN statements, etc. are also 
allowed under certain conditions. In lines 28 and 31 the function H(w) and its dense Jacobian are computed using 
the previously created trace for a new w. In line 40 the sparsity pattern of the Jacobian is detected and used in line 
52 to compute the numerical values of its non-zero elements. The output in lines 33-35 and lines 55-59 show that the 
computed dense and sparse Jacobians are equal for the same w. 
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